Approximating K-means-type Clustering via Semidefinite Programming

Authors

Jiming Peng

Yu Wei

Abstract

One of the fundamental clustering problems is to assign n points to k clusters based on the minimum sum-of-squares clustering (MSSC) criterion, a problem known to be NP-hard. In this paper, using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite program (SDP). We show that our 0-1 SDP model provides a unified framework for several clustering approaches, such as normalized k-cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via relaxed linear and semidefinite programming. Second, we consider how to extract a feasible solution of the original MSSC model from the approximate solution of the relaxed SDP problem. Using principal component analysis, we develop a rounding procedure that constructs a feasible partitioning from a solution of the relaxed problem. In our rounding procedure, we need to solve a k-means clustering problem in $\mathbb{R}^{k-1}$, which can be solved in $O(n^{2(k-1)})$ time. In the case of bi-clustering, the running time of our rounding procedure can be reduced to $O(n \log n)$. We show that our algorithm provides a 2-approximate solution to the original problem. Promising numerical results based on our new method are reported.
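The bi-clustering case mentioned in the abstract can be illustrated with a small sketch. For k = 2 the rounding step works in one dimension, and a 2-means problem on a line can be solved exactly by sorting the points and scanning every split, since optimal clusters on a line are contiguous in sorted order. The code below is only an illustration under that assumption, not the authors' procedure: the function name `two_means_1d` is hypothetical, and the input is assumed to be the points already projected onto a single principal direction.

```python
from itertools import accumulate


def two_means_1d(values):
    """Exact 2-means on a line (hypothetical helper, not from the paper).

    Sorting costs O(n log n); the scan over split points is O(n)
    thanks to prefix sums of the values and of their squares.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")

    # Sort indices so that clusters become contiguous ranges.
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]
    prefix = [0.0] + list(accumulate(xs))
    prefix_sq = [0.0] + list(accumulate(x * x for x in xs))

    def sse(lo, hi):
        # Sum of squared deviations from the mean of xs[lo:hi].
        m = hi - lo
        s = prefix[hi] - prefix[lo]
        return (prefix_sq[hi] - prefix_sq[lo]) - s * s / m

    # Try every split: first t sorted points vs. the remaining n - t.
    best_cost, best_split = min(
        (sse(0, t) + sse(t, n), t) for t in range(1, n)
    )

    # Map the split back to the original point order.
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = 0 if rank < best_split else 1
    return labels, best_cost


labels, cost = two_means_1d([0.0, 10.0, 1.0, 11.0])
```

Here `labels` assigns each input point to cluster 0 or 1 and `cost` is the sum of squared distances to the two cluster means. The sort dominates the running time, which matches the $O(n \log n)$ order quoted for bi-clustering.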

Related resources

Advanced Optimization Laboratory Title: Approximating K-means-type clustering via semidefinite programming

Clustering subgaussian mixtures by semidefinite programming

We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei [PW07]. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context...

A new theoretical framework for K-means-type clustering

Probably certifiably correct k-means clustering

Recently, Bandeira [5] introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei’s semidefinite relaxation of k-means [20] is tight with high probability und...

Robustness of SDPs for Partial Recovery of Clustering Subgaussian Mixtures

In this paper, we examine the robustness of a relax-and-round k-means clustering procedure, a method for clustering subgaussian mixtures using semidefinite programming first introduced in [MVW16]. We are interested in the robustness of the algorithm when there is an adversarial corruption of N points each through distance at most R0. We show that under such corruption this specific algorithm we...

Journal:

SIAM Journal on Optimization

Volume 18

Publication date: 2007
